An Example of Hierarchical Ordination

This document describes fitting a hierarchical ordination to data, including code and the details that are needed.

The data

The data is available on figshare, but was originally collected by Ribera et al. (2001), and reanalyzed by Niku et al. (2021). It includes abundance observations of 68 ground beetles 87 sites in Scotland, along with 17 environmental variables and 20 traits. The aim to is determine how the environmental variables and traits affect composition in the ecological community, including whether they interact. The methodological question is how does hierarchical ordination help.

The abundances, traits and environment are each stored in a different matrix. First we load the data and set some constants:

library(nimble)
library(nimbleHMC)
library(coda)
library(lattice)

# Response data
Y <- t(read.csv("Beetles/Y.csv"))
colnames(Y) <- Y[2,]
Y<-Y[-c(1:2),-c(1,70:71)]
Y <- as.data.frame(apply(Y,2,as.integer))

# Environmental predictors
X <- read.csv("Beetles/X.csv")[,-c(1:5)]
X <- as.data.frame(apply(X,2,as.numeric))
X$Sampling.year <- X$Sampling.year - min(X$Sampling.year)
X$Texture <- as.factor(X$Texture)

# Traits
TR  <- read.csv("Beetles/TR.csv")
row.names(TR) <- TR$SPECIES
TR <- TR[,-c(1:3)]
# Traits to categorical
# Removing question marks, not ideal
TR[,c("CLG","CLB","WIN","PRS","OVE","FOA","DAY","BRE","EME","ACT")] <- apply(TR[,c("CLG","CLB","WIN","PRS","OVE","FOA","DAY","BRE","EME","ACT")],2,function(x)as.factor(gsub("\\?.*","",x)))

# Data standardization
X <- scale(model.matrix(~.,X))[,-1] # environmental variables
TR <- scale(model.matrix(~.,TR))[,-1] # species traits

# Constants
NSites <- nrow(Y) # number of sites
NSpecies <- ncol(Y) # number of species
NTraits <- ncol(TR) # number of traits
NEnv <- ncol(X) # number of environmental predictors

# create data lists for Nimble
dat <- list(Y = Y, X = as.matrix(X), TR = as.matrix(TR))
consts <- list(NSites = NSites, NEnv = NEnv, NTraits = NTraits, NSpecies = NSpecies)

The Model

A quick look with the mvabund R-package indicates that the Poisson mean-variance assumption is inappropriate here.
mvabund::meanvar.plot(Y, ylab="Sample variances (log scale)", xlab="Sample means (log scale)")
abline(a = 0, b = 1, col = "blue")

See Warton (2008) for more information.

So, we that the counts for each species follow a negative binomial distribution with a log link function, as we would in a standard generalized linear model. We assume that each species has a different mean abundance (i.e. for each species \(j\) we have a different intercept \(\beta_{0j}\)), and model the rest of the variation with a hierarchical ordination. This gives the following mean model on the link scale (with linear predictor \(\eta_{ij}\)):

\[ \eta_{ij} = \beta_{0j} + \boldsymbol{z}_i^\top \boldsymbol{\Sigma} \boldsymbol{\gamma}_j. \]

As with any ordination, \(\boldsymbol{z}_i\) and \(\boldsymbol{\gamma}_j\) are the site scores and species loadings, and the columns of \(\boldsymbol{Z}\) are the latent variables (which holds the site scores as the rows). Here we assume that each latent variable has a variance of one, so that \(\boldsymbol{\Sigma}\) holds the variation of the latent variables: it will typically be a diagonal matrix, and if any of the terms on the diagonal are close to zero, this suggests that latent variable has (almost) no effect. It thus provides a straightforward summary of the relative importance of that latent variable, and is similar to the square root of eigenvalues (singular values) in an eigenanalysis.

If we stopped here, this would be a standard Generalized Linear Latent Variable Model. But, here we want to model both \(\boldsymbol{z}_i\) and \(\boldsymbol{\gamma}_j\), i.e. go from modelling model groups of species responding in similar ways to sites to modelling how the traits of the species affect their responses to the environment.

As a simplification, we can think about this as simply a regression of \(\boldsymbol{z}_i\) (the site effects) against the environmental variables and \(\boldsymbol{\gamma}_j\) against the traits. In reality, it is more complicated, because \(\boldsymbol{z}_i\) and \(\boldsymbol{\gamma}_j\) are estimated in the model, so their uncertainty needs to be propagated.

Some more mathematical details are hidden away here, for those who are interested.

We denote the abundance at site \(i = 1 \dots n\) for species \(j = 1 \dots p\) as \(Y_{ij}\), or as a matrix as \(\boldsymbol{Y}\). The environmental variables are \(x_{ik}\) for the \(k = 1 \ldots K\) predictors (or \(\boldsymbol{X}\) as a matrix), and \(t = 1\ldots T\) traits as \(\boldsymbol{TR}\). We then assume:

\[Y_{ij} \sim \text{NB}(\lambda_{ij}, \phi_j),\] where \(\phi_j\) is the scale parameter for each species. Also,

\[\text{log}(\lambda_{ij}) = \eta_{ij}.\]

Consequently, \(\eta_{ij}\) is the linear predictor, which we further model with the hierarchical ordination:

\[ \eta_{ij} = \beta_{0j} + \boldsymbol{z}_i^\top \boldsymbol{\Sigma} \boldsymbol{\gamma}_j. \]

We then model \(\boldsymbol{z}_i\) (as in van der Veen et al. (2023)) and \(\boldsymbol{\gamma}_j\) hierarchically:

\[ \boldsymbol{z}_i = \boldsymbol{B}^\top\boldsymbol{x}_i + \boldsymbol{\epsilon}_i, \] and

\[ \boldsymbol{\gamma}_j = \boldsymbol{\omega}^\top\boldsymbol{TR}_{j} + \boldsymbol{\varepsilon}_j, \] where:

  • \(x_{ik}\) is the \(k^{th}\) predictor (i.e. environmental effect) at site \(i\)
  • \(\boldsymbol{B}\) with entry \(b_{kq} \sim \mathcal{N}(0,1)\) is the effect of the \(k^{th}\) predictor on the site score for the \(q^{th} = 1\ldots d\) latent variable
  • \(\boldsymbol{\epsilon}_i\) with entry \(\epsilon_{iq} \sim \mathcal{N}(0, \sigma^2_q)\) is a vector of residuals for the unexplained part of the site score
  • \(TR_{jt}\) is the \(t^{th}\) predictor (i.e. trait) for species \(j\)
  • \(\boldsymbol{\omega}_t\) with entry \(\omega_{tq}\) is the effect of the \(t^{th}\) trait on the species loading for the \(q^{th}\) latent variable
  • \(\boldsymbol{\varepsilon}_j\) with entry \(\varepsilon_{jq} \sim \mathcal{N}(0, \delta^2_q)\) is a vector of residuals for the unexplained part of the species loading
Note that the predictors are all standardized to zero mean and unit variance. We additionally place exponential priors with rate parameter one on all the scale parameters.

Implementation

We fit the model with the Nimble R-package.

All functions that we need for or after model fitting are here.
Code to generate initial values from prior distributions is here.
# 1LV function
inits <- function(consts){
  B = rnorm(consts$NEnv)
  O = rnorm(consts$NTraits)
  varepsilon = rnorm(consts$NSpecies)
  epsilon = rnorm(consts$NSites)
  list(
    BSTAR = B,
    OSTAR = O,
    epsilonSTAR = epsilon,
    varepsilonSTAR = varepsilon,
    sd.Site = rexp(1),
    sd.Species = rexp(1),
    beta0 = rnorm(consts$NSpecies),
    sd.LV = rexp(1),
    phi = rexp(consts$NSpecies)
  )
}

# 2LV function
inits2 <- function(consts){
    B = matrix(rnorm(consts$nLVs*consts$NEnv),ncol=consts$nLVs)
    B[upper.tri(B)] = 0
    O = matrix(rnorm(consts$nLVs*consts$NTraits),nrow=consts$NTraits)
    O[upper.tri(O)] = 0
    varepsilon = mvtnorm::rmvnorm(consts$NSpecies,rep(0,consts$nLVs),diag(consts$nLVs))
    epsilon = mvtnorm::rmvnorm(consts$NSites,rep(0,consts$nLVs),diag(consts$nLVs))
    list(
        BSTAR = B,
        OSTAR = O,
        epsilonSTAR = epsilon,
        varepsilonSTAR = varepsilon,
        sd.Site = rexp(consts$nLVs),
        sd.Species = rexp(consts$nLVs),
        beta0 = rnorm(consts$NSpecies),
        sd.LV = rexp(consts$nLVs),
        phi = rexp(consts$NSpecies)
    )
}
Functions for MCMC are here.
# Function to run a single chain
RunOneChain <- function(seed, dat, code, inits, consts, ToMonitor=NULL, 
                        Nburn=5e3, NIter=5.5e4, Nthin=10, block = NULL, RW = NULL, ...) {
  require(nimble)
  require(nimbleHMC)
  AllSamplers <- HMCsamplers <- c('epsilonSTAR', 'varepsilonSTAR', 'beta0', 'OSTAR', 'BSTAR',
                  'sd.Site', 'sd.Species', 'sd.LV','phi')
  
  if(!is.null(block)){
  HMCsamplers <- HMCsamplers[!HMCsamplers%in%unique(gsub("\\s*\\[[^\\)]+\\]","",c(unlist(block),unlist(RW))))]
  }
  if(is.null(ToMonitor)) {
    ToMonitor <- c("beta0", "sd.Species", "sd.Site", "sd.LV", "B", "O", 
                   "epsilon", "varepsilon", "gamma", "z", "phi")
  }
  mod <- nimbleModel(code = code, name = "HO", constants = consts, 
                     inits = inits(consts), data = dat, buildDerivs = TRUE)
  model <- compileNimble(mod)
  
  # Do HMC
    conf <- configureHMC(model, nodes = HMCsamplers, monitors = ToMonitor, print = FALSE, 
                         control=list(nwarmup=Nburn, kappa = 0.95, maxTreeDepth = 30))
    if(!is.null(block)) { 
    if(is.list(block)){
    # Use a slice everything that not being HMCed
      lapply(block, conf$addSampler, type = "AF_slice", sliceAdaptFactorInterval = Nburn)
    }else{
      # Use a slice everything that not being HMCed
      sapply(block, conf$addSampler, type = "AF_slice", sliceAdaptFactorInterval = Nburn)
    }
    }
    
    if(!is.null(RW)) { 
    if(is.list(RW)){
    # Use a slice everything that not being HMCed
      lapply(RW, conf$addSampler, type = "RW")
    }else{
      # Use a slice everything that not being HMCed
      sapply(RW, conf$addSampler, type = "RW")
    }
    }
    
  mcmc <- buildMCMC(conf)
  cmcmc <- compileNimble(mcmc, project = model)
  res <- runMCMC(cmcmc,  niter=NIter, nburnin = Nburn, thin=Nthin, 
                 nchains = 1, samplesAsCodaMCMC = TRUE, ...)
  return(res)
}

# Function to run chains in  parallel
ParaNimble <- function(NChains, ...) {
  opts <- list(...)
  if(!is.null(opts$seeds) && (length(opts$seeds) == NChains)){
    seeds <- opts$seeds
    opts <- opts[-which(names(opts)=="seeds")]
  }else{
    seeds <- 1:NChains
  }
  require(parallel)
  nimble_cluster <- makeCluster(NChains)
  
  samples <- parLapply(cl = nimble_cluster, X = seeds, ...)
  stopCluster(nimble_cluster)

  # Name the chains in the list
  chains <- setNames(samples,paste0("chain", 1:length(samples)))
  chains
}

# Function to create list of names for parameters to use a block slice sampler for
MakeBlockList <- function(consts, LVwise = TRUE){
# builds list for slice AF sampling
# with LVwise = TRUE we are LV-wise blocking B and epsilon, and O and varepsilon
# otherwise, LVs are jointly blocked
if(LVwise & consts$nLVs>1){
blockList <- c(sapply(1:consts$nLVs,
       function(x,consts){
         c(paste0("BSTAR[", 1:consts$NEnv,", ",x, "]"),
           paste0("epsilonSTAR[", 1:consts$NSites,", ",x, "]"))
         }
       ,
         consts = consts,simplify=F),
       sapply(1:consts$nLVs,
       function(x,consts){
         c(paste0("OSTAR[", 1:consts$NTraits,", ",x, "]"),
           paste0("varepsilonSTAR[", 1:consts$NSpecies,", ",x, "]"))
         }
       ,
         consts = consts,simplify=F)
       )
if(consts$nLVs>1){
# remove entries of B,O-STAR that are fixed
for(q in 2:consts$nLVs){
  blockList[[q]] <- blockList[[q]][-(1:(q-1))]
  blockList[-c(1:consts$nLVs)][[q]] <- blockList[-c(1:consts$nLVs)][[q]][-(1:(q-1))]
}
}
}else{
  blockList = list(c("BSTAR","epsilonSTAR"),c("OSTAR","varepsilonSTAR"))
}
blockList
}
Functions to process results after MCMC are here.
# Utility Function to get logical indicators for if names contains a string in v
GetInds <- function(v, names) {
  if(length(v)==1) {
    res <- grep(v, names)
  } else {
    res <- c(unlist(sapply(v, grep, x=names)))
  }
  res
}

# Function to swap signs of all variables varinds to have same sign as vartosign.
ReSignChain <- function(chain, varinds, vartosign) {
  #    MeanSign <- sign(mean(chain[  ,s])) # Might need this
  res <- t(apply(chain, 1, function(x, vs, vi) {
    Names <- names(x)[vi]
    if(any(grepl(",", Names))) {
      lvind <- gsub(".*, ", "", Names)
    } else {
      lvind <- seq_along(vs)
    }
    x[vi] <- x[vi]*sign(x[vs[lvind]])
    x
  }, vs=vartosign, vi=varinds))
  as.mcmc(res)
}

# Function to post-process chains to swap signs.
postProcess <- function(Chains, VarsToProcess, VarsToSwapBy = NULL, VarToSign=NULL, print=FALSE) {
  if(is.null(VarToSign)) VarToSign <- VarsToProcess
  SignInd <- GetInds(VarToSign, colnames(Chains[[1]]))
  # Get indicators for all variables to have their signs changed
  ProcessInds <- GetInds(VarsToProcess, names = colnames(Chains[[1]]))
  
  # Check if > 1 LV
  SeveralLVs <- any(grepl(",", colnames(Chains[[1]])[SignInd]))
  
  # Calculate variance of mean of indicator of sign: 
  #    hopefully largest is variable with most sign swapping (i.e. )
  Signs <- as.data.frame(lapply(Chains, function(mat, ind) {
    colMeans(mat[,ind]>0)
  }, ind=SignInd))
    VarSign <- apply(Signs, 1, var)
  
  if(SeveralLVs) {
    LV <- gsub(".*, ", "", colnames(Chains[[1]])[SignInd])
    #Chose variables who's sign will be used to swap other signs
    if(is.null(VarsToSwapBy)){
      VarsToSwapBy <- sapply(unique(LV), function(lv, vs) {
      vv <- vs[grep(lv, names(vs))]
      nm <- names(which(vv==max(vv)))
      if(length(nm)>1) nm <- nm[1] # probably something more sophisticated is better
      nm
    }, vs = VarSign, simplify = TRUE)
    }
    if(print) message(paste0("Swapping by ", paste(VarsToSwapBy, collapse=", ")))
    chains.sgn <- lapply(Chains, ReSignChain, vartosign=VarsToSwapBy, varinds=ProcessInds)
    
  } else { # only 1 LV
    if(is.null(VarsToSwapBy)){
    #Chose variables who's sign will be used to swap other signs
    VarsToSwapBy <- names(which(VarSign==max(VarSign)))[1]
    }
    if(print) message(paste0("Swapping by ", paste(VarsToSwapBy, collapse=", ")))
#    chains.sgn <- lapply(Chains, ReSignChain, varNames=VarsToProcess, sign.var=ind)
    chains.sgn <- lapply(Chains, ReSignChain, vartosign=VarsToSwapBy, varinds=ProcessInds)
  }
  as.mcmc.list(chains.sgn)
}

# Function to convert a variable in an MCMC chain that should be a matrix from 
# a vector to the right matrix
ChainToMatrix <- function(ch, name) {
  v <- ch[grep(paste0("^", name, "\\["), names(ch))]
  l.row <- as.numeric(gsub(".*\\[", "", gsub(",.*", "", names(v))))
  l.col <- as.numeric(gsub(".*, ", "", gsub("\\]", "", names(v))))
  mat <- matrix(0, nrow=max(l.row), ncol=max(l.col))
  for(i in seq_along(ch)) { # there must be a quicker way...
    mat[l.row[i], l.col[i]] <- v[i]
  }
  mat
}

# Function to convert a variable in an MCMC chain that should be a vector to a diagonal matrix
ChainToDiag <- function(ch, name) {
  v <- ch[grep(paste0("^", name, "\\["), names(ch))]
  mat <- diag(v)
  mat
}

# Function to rotate a latent variable, defaults to Varimax. Methods in GPArotation package
# ch: one draw from MCMC chain
# RotateBy: name of variable to rotate by (i.e. to calculate the rotation matrix for)
# SDby: name of Standard deviations
# ToRotate: Variables to rotate (e.g. epsilons, gamma/z).
# SDToRotate: Standatd devations to rotate
# rotation.fn: Function in GPArotation to calcualte rotation, or svd. Defaults to svd.
# scale: Should RotateBy be scaled by SDby before rotating? Defaults to FALSE
rotate2LV <- function(ch, RotateBy, SDby, ToRotate=NULL, SDToRotate=NULL, rotation.fn="svd", 
                      scale=FALSE) {
  require(GPArotation)
  # Function to rotate matrices & return full vector or a matrix (retmat)
  RotateVar <- function(name, vec, rotmat, retmat=FALSE) {
    mat <- ChainToMatrix(ch=vec, name=name)
    rotated <- mat%*%rotmat
    
    if(retmat) {
      res <- rotated
    } else {
      vec[grep(paste0("^", name, "\\["), names(vec))] <- c(rotated)
      res <- vec
    }
    res
  }
# Function to rotate (diagonal) matrices & return the diagonal
  RotateSD <- function(name, vec, rotmat) {
    mat <- diag(vec[grep(paste0("^", name, "\\["), names(vec))])
    rotated <- mat%*%rotmat
    
    vec[grep(paste0("^", name, "\\["), names(vec))] <- diag(rotated)
    vec
  }

  # Extract SDs and latent variables
  SDs <- ch[grep(SDby, names(ch))]
  LVmat <- ChainToMatrix(ch, RotateBy)
  if(scale) LVmat <- sweep(LVmat, 2, SDs, "*")
  
  # Calculate a rotation matrix
  if(rotation.fn!="svd"){
  loadings.r  <- do.call(rotation.fn, list(A=LVmat)) # rotate
  }else{
  rot <- svd(LVmat)$v
  loadings.r <- list(Th=rot, loadings = LVmat%*%rot)
  }
  
  # Retrieve correct scale of LVs
  if(scale){
    #also need to put LVs into here
    ch[grep(paste0("^", RotateBy, "\\["), names(ch))]  <- scale(loadings.r$loadings)
    ch[grep(paste0("^", SDby, "\\["), names(ch))] <- attr(scale(loadings.r$loadings),"scaled:scale")
  } else{
   # if LVs are not scaled by LV sds, we just add the new LV scale to LVs SDs.
     ch[grep(paste0("^", RotateBy, "\\["), names(ch))] <- scale(loadings.r$loadings)
     ch[grep(paste0("^", SDby, "\\["), names(ch))] <- attr(scale(loadings.r$loadings),"scaled:scale")*SDs
  }

  # Rotate rest of the variables
  for(nm in unique(ToRotate)) {
    ch <- RotateVar(vec=ch, name=nm, rotmat=loadings.r$Th, retmat = FALSE)
  }
   for(nm in unique(SDToRotate)) {
    ch <- RotateSD(vec=ch, name=nm, rotmat=loadings.r$Th)
  }

  ch
}

RotateChains <- function(mcmc.lst, RotateBy="z", SDby = "sd.LV", 
                         ToRotate=c("B", "epsilon", "gamma", "O", "varepsilon"), 
                         SDToRotate=c("sd.Site", "sd.Species"), ...) {
  if(SDby%in%SDToRotate)stop("'SDby' should not be present in 'SDToRotate'.")
  if(RotateBy%in%ToRotate)stop("'RotateBy' should not be present in 'ToRotate'.")
  rotated <- lapply(mcmc.lst, function(mcmc) {
    rot <- apply(mcmc, 1, rotate2LV, RotateBy=RotateBy, SDby = SDby, 
                         ToRotate=ToRotate, SDToRotate=SDToRotate, ...)
    as.mcmc(t(rot))
  })
  as.mcmc.list(rotated)
}

RescaleVars <- function(vec, ScaleBy, SDTorescale, ToRescale) {
  vec2 <- vec
  ScBy <- ChainToMatrix(vec, ScaleBy)
  SD <- apply(ScBy, 2, sd)
  
  for(nm in unique(c(ScaleBy, ToRescale))) {
      Sc.x <- ChainToMatrix(vec, nm)
      Sc.x.sc <- sweep(Sc.x, 2, SD, "/")
      vec2[grep(paste0("^", nm, "\\["), names(vec2))] <- c(Sc.x.sc)
  }
  for(nm in SDTorescale) {
    SD.x <- vec[grep(paste0("^", nm, "\\["), names(vec))]
    vec2[grep(paste0("^", nm, "\\["), names(vec2))] <- SD.x/SD
  }
  vec2
}

RescaleChains <- function(mcmc.lst, ...) {
  rescale <- lapply(mcmc.lst, function(mcmc) {
    rot <- apply(mcmc, 1, RescaleVars, ...)
    as.mcmc(t(rot))
  })
  as.mcmc.list(rescale)
}
Plotting functions.
PlotPost <- function(var, summ, varnames=NULL, ...) {
  vars <- grep(var, rownames(summ$statistics))
  if(is.null(varnames)) varnames <- rownames(summ$statistics)[grep(var, rownames(summ$statistics))]
  if(length(varnames)!=length(vars)) 
    stop(paste0("Number of variable names, ", length(varnames), "not the same as number of variables, ",
                length(vars)))
  
  plot(summ$statistics[vars,"Mean"], 1:length(vars), 
       xlim=range(summ$quantiles[vars,]), yaxt="n", ...)
  segments(summ$quantiles[vars,"2.5%"], 1:length(vars), summ$quantiles[vars,"97.5%"], 1:length(vars))
  segments(summ$quantiles[vars,"25%"], 1:length(vars), summ$quantiles[vars,"75%"], 1:length(vars), lwd=3)
  abline(v=0, lty=3)
  axis(2, at=1:length(vars), labels=varnames, las=1)
}

AddArrows <- function(coords, marg= par("usr"), col=2) {
  origin <- c(mean(marg[1:2]), mean(marg[3:4]))
  Xlength <- sum(abs(marg[1:2]))/2
  Ylength <- sum(abs(marg[3:4]))/2
  ends <- coords / max(abs(coords)) * min(Xlength, Ylength) * .8
  arrows(
    x0 = origin[1],
    y0 = origin[2],
    x1 = ends[,
              1] + origin[1],
    y1 = ends[, 2] + origin[2],
    col = col,
    length = 0.1)
  text(
    x = origin[1] + ends[, 1] * 1.2,
    y = origin[2] + ends[, 2] * 1.2,
    labels = rownames(coords),
    col = col)
}
Functions to retrieve results for the 4th corner term.
# Function to get MCMC results for reduced-rank approximated interaction term
getMCMCInt <- function(mcmc.lst, dat,...) {
  int.mcmc <- lapply(mcmc.lst, function(mcmc){
    inCoefs <- as.mcmc(t(apply(mcmc, 1, function(ch){
    B <- ChainToMatrix(ch,"B")
    O <- ChainToMatrix(ch,"O")  
    SDs <- diag(ch[grep("sd.LV",gsub("\\s*\\[[^\\)]+\\]","",names(ch)))])
    coefs <- B%*%SDs%*%t(O)
    vec <- c(coefs)
    names(vec) <- paste0(colnames(dat$X),":",rep(colnames(dat$TR),each=ncol(dat$X)))
    vec
    })))
  }
  )
  as.mcmc.list(int.mcmc)
}
Some final summary functions.
ExtractMeans <- function(mcmc.lst, name=NULL) { # this needs defensive programming
  if(is.null(name)) {
    ind <- 1:nrow(mcmc.lst$statistics)
  } else {
    ind <- grep(name, rownames(mcmc.lst$statistics))
  }
  Means <- mcmc.lst$statistics[ind,"Mean"]
  if(any(grep(",", names(Means)))) {
    Col <- gsub("]", "", unique(gsub(".*, ", "", names(Means))))
    res <- sapply(Col, function(cc, mn) {
      mn[grep(paste0(cc,"]"), names(mn))]
    }, Means)
  } else {
    res <- Means
  }
  res
}
GetMeans <- function(summ, name, d=1) {
  if(is.null(d)) d <- 1
  var <- summ$statistics[grep(paste0('^', name), rownames(summ$statistics)),"Mean"]
  matrix(var,ncol=d)
}

We start with a single dimension for simplicity, so that we can show the steps needed.

One dimensional ordination

To fit the model we need to 1) code up the likelihood, 2) specify the Markov Chain Monte carlo samplers and initial values, 3) run it. We parallelise running the MCMC so that each chain is run on a different processor. This means we need a function to run one chain, and then another to run all of the chains together.

The Nimble code for the model and likelihood is here
OneLV.nim <- nimbleCode({
  for (i in 1:NSites) {
    for (j in 1:NSpecies) {
# These three lines specify the model:   
      Y[i, j] ~ dnegbin(phi[j]^-1/(phi[j]^-1+lambda[i, j]), phi[j]^-1) # Likelihood: Y follows a NB distribution
      
      log(lambda[i, j]) <- eta[i, j] # Specify the log link function
      eta[i,j] <- beta0[j] + gamma[j]*sd.LV*z[i] # linear predictor: species + LV
    }
# Site scores: regression against environmental effects
#    the Bs are the coefficients of the environmental effects
      epsilonSTAR[i] ~ dnorm(0, sd = sd.Site) # Residual site effect
      XB[i] <- inprod(X[i, 1:NEnv],BSTAR[1:NEnv])
      zSTAR[i] <- XB[i] + epsilonSTAR[i]
  }
  
  for(j in 1:NSpecies) {
# Species effects: regression against trait effects
#    The Os are the trait effects.
      varepsilonSTAR[j] ~ dnorm(0, sd = sd.Species) # Residual
      omegaTR[j] <- inprod(TR[j, 1:NTraits],OSTAR[1:NTraits])
      gammaSTAR[j] <- omegaTR[j] + varepsilonSTAR[j]
      beta0[j] ~ dnorm(0, sd = 1) # Species means
  }
  
# Here we standardise z and gamma, so the both have a variance of 1.
#   The standard deviation of the latent variable is sd.LV.
    zmu <- mean(zSTAR[1:NSites])
    gammamu<- mean(gammaSTAR[1:NSpecies])
    StdDev.z <- sqrt(mean((zSTAR[1:NSites] - zmu)^2))
    StdDev.gamma <- sqrt(mean((gammaSTAR[1:NSpecies] - gammamu)^2))
    z[1:NSites] <- zSTAR[1:NSites]/StdDev.z
    gamma[1:NSpecies] <- gammaSTAR[1:NSpecies]/StdDev.gamma
    
# Scale other parameters
    epsilon[1:NSites] <- epsilonSTAR[1:NSites]/StdDev.z
    B[1:NEnv] <-  BSTAR[1:NEnv]/StdDev.z
    varepsilon[1:NSpecies] <- varepsilonSTAR[1:NSpecies]/StdDev.gamma
    O[1:NTraits] <-  OSTAR[1:NTraits]/StdDev.gamma

    # priors for scales
    sd.Site ~ dexp(1)
    sd.Species ~ dexp(1)
    sd.LV ~ dexp(1)

    for(k in 1:NEnv){
      BSTAR[k] ~ dnorm(0, sd = 1)  
    }
    for(tr in 1:NTraits){
      OSTAR[tr] ~ dnorm(0, sd = 1)
    }
     for(j in 1:NSpecies){
      phi[j] ~ dexp(1)
    }
})

Latent variable models are notorious for being unidentifiable, you can get the same mean abundances from different combinations of the parameters. We have to make some adjustments to the model to account for this: some of this is done in the model fitting, but for others it is easier to do it after we obtain the posterior samples.

The details of what we do to make the HO identifiable are here
  • First, we standardise \(\boldsymbol{z}_i\) and \(\boldsymbol{\gamma}_j\) to unit variance per latent variable to prevent scale invariance
  • At this point the model is still invariant to sign switching: because \(z_{i}\gamma_{j} = (-z_{i})(-\gamma_{j})\) in the MCMC algorithm can (and does) switch signs mid run. We could solve it by placing a truncated normal prior on the main diagonal entries of \(\boldsymbol{B}\) or \(\boldsymbol{\omega}\), but that results in bimodal posterior distributions for other coefficients. Instead we impose sign constraints by post-process the chains. We choose one parameter, for example one \(z_{iq}\) or \(\gamma_{jq}\), to be positive, and switch all of the other parameters based on that one. See below for the details of how we fix the signs.
We want to make one parameter positive, and ideally we want to do this to a parameter that clearly has both modes away from zero. We can identify this in an ad hoc way: for each chain for every parameter we calculate the proportion of iterations where the sign is positive, and then for every parameter we calculate the variance in that proportion. If a parameter is centered around 0 the mean proportion will be about 0.5, and will not vary much, whereas if it is some way from 0 the mean will be close to 0 or 1, and the variance will be high. Alternativley, we could look at the largest \(|p - 0.5|\), or we can visually identify such parameters from the posterior samples. There may be even better alternatives, and most of the time visual inspection of posterior density plots is most straightforward.

Finally, we can run the MCMC. We use a block slice sampler for the predictor effects and residual terms, a random walk sampler for the intercepts and dispersion parameters, and Hamiltonian Monte Carlo for the remaining scale parameters.

consts$nLVs = 1
chains.unsgn <- ParaNimble(4, fun = RunOneChain,
                       dat = dat,
                       code = OneLV.nim,
                       inits = inits, 
#                       Nburn = 5e1, NIter = 5.5e2, Nthin = 1, # for a small run
                       Nburn = 1e3, NIter = 2e3, Nthin = 1, # for a big HMC run
                       consts = consts, 
                       block = MakeBlockList(consts, LVwise = TRUE), RW = c(paste0("beta0[",1:consts$NSpecies,"]"),paste0("phi[",1:consts$NSpecies,"]"))) # HMC on the rest (hypers)

# post-process chains for sign-swapping
VarsToProcess <- c("^B", "^O", "^epsilon", "^varepsilon", "^z", "^gamma")
VarsToSign <- c("^B")
chains <- postProcess(chains.unsgn, VarsToProcess = VarsToProcess, 
                      VarToSign = VarsToSign, print = TRUE, VarsToSwapBy = "B[26]")
summ.HO = summary(chains)

Results

There are a lot of parameters, so a lot of results to look at. Here we concentrate on \(\boldsymbol{B}\), \(\boldsymbol{\omega}\), \(\boldsymbol{\sigma}\) and \(\boldsymbol{\delta}\), and a latent variable-plot.

We can look at the chains, to see that they have converged.

First the environmental effects:

library(basicMCMCplots)
chainsPlot(chains, var = "B", legend = F)

Then the trait effects:

chainsPlot(chains, var = "O", legend = F)

These look quite OK (after post-processing the signs). Now we can create a plot of the site scores against their indices to inspect the results.

par(mar=c(4.1, 7, 1, 1))
PlotPost(var = "B", summ = summ.HO, varnames = gsub("\\.", " ", colnames(X)), 
         xlab = "Environmental Effect", ylab = "")

We can see that all effects are quite uncertain, but that management intensity, elevation, and pH have non-zero effects effects on the latent variable. Management has a positive coefficient, elevation negative, and pH positive so beetles that have a positive species score will be more abundant in areas with more intense management, lower elevation, and more basic pH; i.e., lowlands.

We can also look at the trait effects. The bottom line is that there is too much uncertainty to say much, but LTL (beetle length) has a non-zero negative effect.

par(mar=c(4.1, 7, 1, 1))
PlotPost(var = "O", summ = summ.HO, varnames = gsub("\\.", "\n", colnames(TR)), 
         xlab = "Trait Effect", ylab = "")

We can also calculate the full distributions of site and species scores, and plot them. We see that, well, there is variation, so the trait effects are not uncertain because the species scores are uncertain: it is because their effects are small.

par(mfrow = c(2, 1), mar = c(2, 12, 4, 1))
PlotPost("^z", summ = summ.HO, varnames = NULL, ylab = "", xlab = "Latent variable 1", main = "Sites")
PlotPost("^gamma", summ = summ.HO, varnames = NULL, ylab = "", xlab = "Latent variable 1", main = "Species")

More dimensions

More than one dimension requires adding extra identifiability constraints. Now that we have two or more latent variables, we also have to worry about their rotation.

For those who are interested, this is how we deal with the rotation

There are various ways to place the constraints, some more numerically stable than others. Generally, we note that the model on the link scale is similar to existing matrix decompositions, so that much about the required constraints can be learned from those.

To prevent rotational invariance, we require both \(\boldsymbol{B}\) and \(\boldsymbol{O}\) to have zeros above the main diagonal. van der Veen et al (2023) assumed \(\boldsymbol{B}\) to be a semi-orthogonal matrix instead, as they encountered numerical instability with the constraints that we impose here. However, their constraint would require sampling on constrained parameter spaces, which is difficult, while this constraint formulation allows to use standard out-of-the-box MCMC samplers (also, it seems to work fine here). Together with the scale constraint for the latent variables there are \(d(d-1)-2K\) constraints in the model, and when \(d = K\) the number of parameters is the same in the interaction terms as for the fourth corner model by Niku et al.(2021). When \(d \gt min(K,T)\) the model can still be fitted (but either \(\boldsymbol{B}\) or \(\boldsymbol{O}\) needs to be padded with a column of zeros), but with \(d \gt max(d,K)\) there are no further predictor effects to be estimated (as the model includes the same number of parameters as the full rank term) and additional constraints will need to be introduced.
The new model code is included here
# Model code
# Update our constants from before with the new number of LVs, rest remains the same
HO.nim <- nimbleCode({
  for (i in 1:NSites) {
    for (j in 1:NSpecies) {
      eta[i,j] <- beta0[j] + sum(gamma[j,1:nLVs]*sd.LV[1:nLVs]*z[i,1:nLVs])
      log(lambda[i, j]) <- eta[i, j]
      Y[i, j] ~ dnegbin(phi[j]^-1/(phi[j]^-1+lambda[i, j]), phi[j]^-1)
    }      
    for (l in 1:nLVs) {
      XB[i, l] <- sum(X[i, 1:NEnv]*BSTAR[1:NEnv, l])
      epsilonSTAR[i,l] ~ dnorm(0,sd.Site[l])# Residual
      zSTAR[i,l] <- XB[i,l] + epsilonSTAR[i,l]
    }
  }
  
  for(j in 1:NSpecies) {
    for (l in 1:nLVs) {
      omegaTR[j, l] <- sum(TR[j, 1:NTraits]*OSTAR[1:NTraits, l])
      varepsilonSTAR[j,l] ~ dnorm(0,sd.Species[l]) # Residual
      gammaSTAR[j,l] <- omegaTR[j,l] + varepsilonSTAR[j,l]
    }
    
    beta0[j] ~ dnorm(0, sd=1)
  }
  # Constraints to 0 on upper diagonal
  # stole some code from Boral for this - thanks Francis
  for(l in 1:(nLVs-1)) { 
    for(ll in (l+1):(nLVs)) {
      BSTAR[l,ll] <- 0 
      OSTAR[l,ll] <- 0
    } 
  }

  for(l in 1:nLVs) { 
    # diagonal elements
    BSTAR[l,l] ~ dnorm(0,sd=1)
    OSTAR[l,l] ~ dnorm(0,sd=1)

    ## standardizing z and gamma
    zmu[l] <- mean(zSTAR[1:NSites,l])
    StdDev.z[l] <- sqrt(mean((zSTAR[1:NSites,l] - zmu[l])^2))
    z[1:NSites,l] <- (zSTAR[1:NSites,l]-zmu[l])/StdDev.z[l]
    
    gammamu[l] <- mean(gammaSTAR[1:NSpecies,l])
    StdDev.gamma[l] <- sqrt(mean((gammaSTAR[1:NSpecies,l] - gammamu[l])^2)) 
    gamma[1:NSpecies,l] <- (gammaSTAR[1:NSpecies,l]-gammamu[l])/StdDev.gamma[l]
    
    # Scale other parameters
    epsmu[l] <- mean(epsilonSTAR[1:NSites,l])
    varepsmu[l] <- mean(varepsilonSTAR[1:NSpecies,l])
    
    epsilon[1:NSites,l] <- (epsilonSTAR[1:NSites,l]-epsmu[l])/StdDev.z[l]
    B[1:NEnv,l] <-  BSTAR[1:NEnv,l]/StdDev.z[l]
    varepsilon[1:NSpecies,l] <- (varepsilonSTAR[1:NSpecies,l]-varepsmu[l])/StdDev.gamma[l]
    O[1:NTraits,l] <-  OSTAR[1:NTraits,l]/StdDev.gamma[l]
  
    # priors for scales
    sd.Site[l] ~ dexp(1)
    sd.Species[l] ~ dexp(1)
    sd.LV[l] ~ dexp(1)
    } 
  
  ## Free lower diagonals
  for(l in 2:nLVs) { 
    for(ll in 1:(l-1)) { 
      BSTAR[l,ll] ~ dnorm(0,sd=1)
      OSTAR[l,ll] ~ dnorm(0,sd=1)
      } 
  }
  ## All other elements
  for(l in (nLVs+1):NEnv) { 
    for(ll in 1:(nLVs)) { 
      BSTAR[l,ll] ~dnorm(0,1) 
    } 
  } 

  for(l in (nLVs+1):NTraits) { 
    for(ll in 1:(nLVs)) { 
      OSTAR[l,ll] ~dnorm(0, 1) 
    } 
  } 

     for(j in 1:NSpecies){
      phi[j] ~ dexp(1)
    }
})
We rotate the ordination after the MCMC to make it more interpretable. Here is an explanation of what is done.

Our model is:

\[ \eta_{ij} = \beta_{0j} + \boldsymbol{z}_i^\top \boldsymbol{\Sigma} \boldsymbol{\gamma}_j. \]

There are many possible rotations (see the https://cran.r-project.org/web/packages/GPArotation/index.html package for a long list). Here we will rotate based on a singular value decomposition (SVD) of the latent variables, as in the gllvm R-package Niku et al.. This rotates the ordination so that most of the variation in \(\boldsymbol{z}_i^\top \boldsymbol{\Sigma}\) is put on the first latent variable, and so that the second latent variable has the next largest amount of variance. Note, that this does not guarantee that these are the latent variables that explain most variation in the response, unlike in eigenanalysis.

In detail, for each draw from the posterior we:

  1. Multiply the LV-specific standard deviations by the column-wise standard deviation of \(\boldsymbol{\Gamma}\) and/or \(\boldsymbol{Z}\) as deemed appropriate
  2. Column-wise re-scale \(\boldsymbol{B}, \boldsymbol{E}, \boldsymbol{\Gamma}, \boldsymbol{Z}, \boldsymbol{\Omega}\) and \(\boldsymbol{\mathcal{E}}\): so that both \(\boldsymbol{Z}\) and \(\boldsymbol{\Gamma}\) have column-wise unit variance.
  3. Calculate a rotation matrix \(\boldsymbol{M}\) from \(\boldsymbol{Z}\Sigma\)
  4. Rotate \(\boldsymbol{B}, \boldsymbol{E}, \Gamma, \boldsymbol{Z}, \boldsymbol{\Omega}\) and \(\boldsymbol{\mathcal{E}}\)
  5. Column-wise (LV-wise) re-calculate the scale of \(\boldsymbol{Z}\), \(\boldsymbol{E}\), \(\boldsymbol{\mathcal{E}}\)

The first step sweeps the scale of the LVs and loadings into the LV-specific scale so it still represents the total variance of the latent variables. <>

We can run the fitting with the same code as before:

consts$nLVs <- nLVs <- 2
HO.2LV <- ParaNimble(4, fun = RunOneChain,
                       dat = dat,
                       code = HO.nim,
                       inits = inits2,
#                       Nburn=5e1, NIter=5.5e2, Nthin=1, # for a small run
                       Nburn = 1e3, NIter = 5e3, Nthin = 1, # for a full run
                       consts = consts, 
                       block = MakeBlockList(consts, LVwise=TRUE), RW = c(paste0("beta0[",1:consts$NSpecies,"]"),paste0("phi[",1:consts$NSpecies,"]"))) # HMC on the rest (hypers))

and do the post-processing with functions previously defined:

# sd.LV does not need to be rescaled, since it is estimated for var(gamma) = var(z) = 1 in the nimble code
# Re-scale gamma
chains2LV.sc <- RescaleChains(HO.2LV, ScaleBy = "gamma", 
                              SDTorescale = c("sd.Species"),
                              ToRescale = c("gamma",  "O", "varepsilon"))

# Re-scale Z
chains2LV.sc2 <- RescaleChains(chains2LV.sc, ScaleBy = "z", 
                              SDTorescale = c("sd.Site"), 
                              ToRescale = c("z",  "B", "epsilon"))

# Now rotate based on Z*Sigma and return its scale as LVsd.
# Note that rotation can lead to non-diagonal sd.Site and sd.Species
# But off-diagonals are not currently returned
chains2LV.r <- RotateChains(chains2LV.sc2, rotation.fn = "svd", scale = T)

# post-process chains for sign-swapping
chains2LV <- postProcess(chains2LV.r, VarsToProcess = VarsToProcess, VarsToSwapBy = c("1]" = "B[26, 1]", "2]" = "B[22, 2]"), print = F)

# Calculate summaries
summ.HO2LV <- summary(chains2LV)

rm(chains2LV.sc, chains2LV.sc2, chains2LV.r)

Results

First, we look at the mixing.

Environment first:

chainsPlot(chains2LV, var = "B", legend = F, ncols = 2)

And the trait effects:

chainsPlot(chains2LV, var = "O", legend = F, ncols = 2)

The LV-specific standard deviations gives an impression of relative importance of the latent variables;

SDs <- summ.HO2LV$statistics[grep("sd", rownames(summ.HO2LV$statistic)),]
rownames(SDs) <- c(paste0("Latent Variable ", 1:2),
              paste0("Site ", 1:2),
              paste0("Species ", 1:2))
knitr::kable(SDs, digits=2, format = "html", table.attr = "style='width:70%;'")
Mean SD Naive SE Time-series SE
Latent Variable 1 1.78 0.09 0 0.00
Latent Variable 2 1.00 0.06 0 0.00
Site 1 -0.02 0.22 0 0.01
Site 2 -0.01 0.12 0 0.01
Species 1 0.00 0.01 0 0.00
Species 2 0.00 0.02 0 0.00

we see that the first latent variable explains about twice as much variation than the second. We can also present this as a ratio:

Ratio <- as.mcmc.list(lapply(chains2LV,function(ch)as.mcmc(data.frame(ch=ch[,"sd.LV[2]"]/ch[,"sd.LV[1]"]))))
chainsPlot(Ratio, legend=F)

Again we examine the predictor effects in a caterpillar plot to spot non-zero predictor effects:

par(mar=c(5, 10, 4, 2) + 0.1)
PlotPost("B",summ.HO2LV,varnames = paste(rep(colnames(X),2), rep(paste0("LV.",1:2),each=ncol(X)), sep = "-"), xlab="Predictor effect", ylab=NA)

PlotPost("O",summ.HO2LV,varnames = paste(rep(colnames(TR),2), rep(paste0("LV.",1:2),each=ncol(TR)), sep = "-"), xlab="Predictor effect", ylab=NA)

From which we see that there are various non-zero effects for the environmental effects, but that generally the trait effects are still too uncertain to draw any conclusions.

However, What we are really interested in, is looking at the environment-trait interactions. We can plot those with the following code.

intChains <- getMCMCInt(chains2LV, dat = dat)
summ.EnvTraitInt <- summary(intChains)
coefs <- matrix(summ.EnvTraitInt$statistics[,1], ncol = ncol(dat$TR), nrow=ncol(dat$X), dimnames = list(colnames(dat$X),colnames(dat$TR)))
# This plotting code was retrieved from the gllvm vignette: https://cran.r-project.org/web/packages/gllvm/vignettes/vignette1.html
a <- 0.2
colort <- colorRampPalette(c("blue", "white", "red"))
levelplot(coefs, xlab = "Environmental Variables", 
                      ylab = "Species traits", col.regions = colort(100), cex.lab = 1.3, 
                      at = seq(-a, a, length = 100), scales = list(x = list(rot = 45)))

From which we can see that Moisture, elevation, and management have the most pronounced effects; the variables that are included in the analysis of this dataset by Niku et al (2021).

Now, we can make two-dimensional ordination plots of sites and species, with their predictor effects. Note that these have been rotated, so that most ofthe variance should be on the first axis.

z.m <- GetMeans(summ.HO2LV, name="^z", d=consts$nLVs)
gamma.m <- GetMeans(summ.HO2LV, name="^gamma", d=consts$nLVs)
vareps.m <- GetMeans(summ.HO2LV, name="^varepsilon", d=consts$nLVs)
eps.m <- GetMeans(summ.HO2LV, name="^epsilon", d=consts$nLVs)
B.m <- GetMeans(summ.HO2LV, name="B", d=consts$nLVs)
row.names(B.m) <- colnames(dat$X)
O.m <- GetMeans(summ.HO2LV, name="O", d=consts$nLVs)
row.names(O.m) <- colnames(dat$TR)

par(mfrow=c(2,1))
#LVs
plot(z.m,type="n", xlab="Latent variable 1", ylab="Latent variable 2", main = "Sites")
text(z.m,labels=1:consts$NSites)
AddArrows(marg = par("usr"), coords=B.m, col="red")

#gammas
plot(gamma.m,type="n", xlab="Latent variable 1", ylab="Latent variable 2", main = "Species")
text(gamma.m,labels=vegan::make.cepnames(colnames(Y)))
AddArrows(marg = par("usr"), coords=O.m, col="blue")

We can see that almost all of the environmental effects seem to act in the same way, mainly on LV1. It is useful to note that the direction of the interaction effect for the environment and traits can be read out of the ordination plot. If the resulting interaction is negative, the environment and trait arrow point in opposite directions. Likewise, if the resulting interaction is positive, the arrows point in the same direction.

Residual covariance matrix

The covariance of species is determined by traits, LV-specific variation and sites’ residual variation, and the covariance between sites is determined by the environment, LV-specific variation and species’ residual variation.

The maths behind this is covered in here

The residual covariance matrix of the model (where you calculate species associations from) is determined by three terms:

  1. \(\boldsymbol{X}\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j \sim \mathcal{N}(0,\boldsymbol{X}\boldsymbol{B}\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\delta}^2)\boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{X}^\top)\)
  2. \(\boldsymbol{TR}\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_i\sim \mathcal{N}(0,\boldsymbol{T}\boldsymbol{\omega}\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\sigma}^2)\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{T}^\top)\)
  3. \(\boldsymbol{\epsilon}_i^\top\boldsymbol{\varepsilon}_j \sum \limits^{d}_{q=1}\Sigma_{q,q}\sigma_q\delta_q\sim \mathcal{K}_0(\sigma^{-1}_q\delta^{-1}_q\vert\epsilon_{iq}\varepsilon_{iq}\vert)\),

where \(\mathcal{K}_0\) denotes the zero order modified Bessel function of the second kind. The first term tells us that correlations between species are determined by the environment. The second term tells us that the correlation between sites is determined by species traits. The last term induces no correlation between sites and species for diagonal \(\boldsymbol{\Sigma}\), but serves to scale species associations. Consequently, the covariance for species \(j,j2\) and site \(i,i2\) is: \[\begin{multline} \Sigma_{i,i2,j,j2} = \text{cov}(\boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{x}_{i2}^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{tr}_{j2}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_{i2}) + \text{cov}(\boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}^\top\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{tr}_j^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_i,\boldsymbol{x}_{i2}^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_{i},\boldsymbol{tr}_{j2}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_{i2}) + \\ \text{cov}(\boldsymbol{tr}_j^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_i,\boldsymbol{\epsilon}_{i2}^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{\epsilon}_{i}^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j},\boldsymbol{x}_{i2}^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{\epsilon}_{i}^\top \boldsymbol{\Sigma} \boldsymbol{\varepsilon}_{j},\boldsymbol{tr}_{j2}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_{i2})+ \text{cov}(\boldsymbol{\epsilon}_i^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}), \end{multline}\] third order terms are zero for central normal random variables, so this simplifies to: \[\begin{equation} \Sigma_{i,i2,j,j2} = \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\varepsilon}_{j2}) \boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i2}+ \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2})\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j2} + \boldsymbol{tr}_j^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\epsilon}_i,\boldsymbol{\varepsilon}_{j2})\boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i2} + \boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\epsilon}_{i},\boldsymbol{\epsilon}_{i2})\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j2} + \text{cov}(\boldsymbol{\epsilon}_i^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}^\top \boldsymbol{\Sigma} \boldsymbol{\varepsilon}_{j2}). \end{equation}\] Here, \(\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}) = \text{cov}(\boldsymbol{\epsilon}_i,\boldsymbol{\varepsilon}_{j2}) = 0\), and for \(\text{cov}(\boldsymbol{\epsilon}_i^\top\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}^\top\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) = 2\text{tr}(\text{diag}(\boldsymbol{\delta}^2)\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\sigma}^2)\boldsymbol{\Sigma})\). Further, \(\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\varepsilon}_{j2})\) is zero for \(j \neq j2\) and \(\text{diag}(\boldsymbol{\delta}^2)\) otherwise, and similar for \(\text{cov}(\boldsymbol{\epsilon}_i,\boldsymbol{\epsilon}_{i2})\) .

Consequently, for a species association matrix where we consider the block of the covariance matrix where \(i = i2\), or for the site-to-site matrix where we consider the block \(j = j2\), we have:

\[\begin{split} \Sigma_{j,j2} &= \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\varepsilon}_{j2}) \boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i} + \text{cov}(\boldsymbol{\epsilon}_i^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i}^\top \boldsymbol{\Sigma} \boldsymbol{\varepsilon}_{j2}) + \boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{var}(\boldsymbol{\epsilon}_{i},\boldsymbol{\epsilon}_{i})\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j2} \\ &= \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\varepsilon}_{j2}) \boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i} + \boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\sigma}^2)\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j2}+ 2\text{tr}\{\text{diag}(\boldsymbol{\delta}^2)\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\sigma}^2)\boldsymbol{\Sigma}\}, \end{split}\]

and

\[\begin{equation} \Sigma_{i,i2} = \boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\epsilon}_{i},\boldsymbol{\epsilon}_{i2})\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j} + \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\delta}^2)\boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i2} + 2\text{tr}\{\text{diag}(\boldsymbol{\delta}^2)\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\sigma}^2)\boldsymbol{\Sigma}\}. \end{equation}\]

We can visualize those matrices here for (e.g.,) the first site and first species, respectively.

#species
spec.mat <- matrix(0,nrow=consts$NSpecies,ncol=consts$NSpecies)
Sigma <- diag(SDs[grep("Latent Variable", rownames(SDs)),1]^2)
Sigma.sp <- diag(SDs[grep("Species", rownames(SDs)),1]^2)
Sigma.si <- diag(SDs[grep("Site", rownames(SDs)),1]^2)

for(j in 1:consts$NSpecies){
  for(j2 in 1:consts$NSpecies){
    if(j==j2){
      spec.mat[j,j2] = X[1,,drop=F]%*%B.m%*%Sigma%*%Sigma.sp%*%t(B.m)%*%t(X[1,,drop=F])
    }
    spec.mat[j,j2] = spec.mat[j,j2] + TR[j,,drop=F]%*%O.m%*%Sigma%*%Sigma.si%*%Sigma%*%t(O.m)%*%t(TR[j2,,drop=F]) + 2*sum(diag((Sigma.sp%*%Sigma%*%Sigma.si%*%Sigma)))
  }
}

spec.cor.mat <- cov2cor(spec.mat)
colnames(spec.cor.mat) <- row.names(spec.cor.mat) <- colnames(Y)
corrplot::corrplot(spec.cor.mat,type = "lower",order = "AOE", main = "Species", mar = c(1,1,1,1),tl.srt=45,tl.cex = .5)

#sites
site.mat <- matrix(0,nrow=consts$NSites,ncol=consts$NSites)

for(i in 1:consts$NSites){
  for(i2 in 1:consts$NSites){
    if(i==i2){
      site.mat[i,i2] = TR[1,,drop=F]%*%O.m%*%Sigma%*%Sigma.si%*%t(O.m)%*%t(TR[1,,drop=F])
    }
    site.mat[i,i2] = site.mat[i,i2] + X[i,,drop=F]%*%B.m%*%Sigma%*%Sigma.sp%*%Sigma%*%t(B.m)%*%t(X[i2,,drop=F]) + 2*sum(diag((Sigma.sp%*%Sigma%*%Sigma.si%*%Sigma)))
  }
}

site.cor.mat <- cov2cor(site.mat)
colnames(site.cor.mat) <- row.names(site.cor.mat) <- paste("Site", 1:consts$NSites)
corrplot::corrplot(site.cor.mat,type = "lower",order = "AOE", main = "Sites", mar = c(3,3,3,3),tl.srt=45,tl.cex = .5)